Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

Authors

  • Hyunjik Kim
  • Yee Whye Teh
Abstract

Automating statistical modelling is a challenging problem that has far-reaching implications for artificial intelligence. The Automatic Statistician employs a kernel search algorithm to provide a first step in this direction for regression problems. However, this does not scale due to its O(N^3) running time for the model selection. This is undesirable not only because the average size of data sets is growing fast, but also because there is potentially more information in bigger data, implying a greater need for more expressive models that can discover finer structure. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm, to encompass big data within the boundaries of automated statistical modelling.
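To make the scaling problem concrete, the following is a minimal sketch of one step of a kernel search scored by the exact Gaussian process log marginal likelihood. It is not the SKC algorithm and not the Automatic Statistician's implementation: the function names (`rbf`, `periodic`, `log_marginal_likelihood`, `score_candidates`), the choice of two base kernels, the fixed hyperparameters, and the synthetic data are all assumptions made for illustration. A real search would also optimise hyperparameters for every candidate, which repeats the expensive step many times.

```python
import numpy as np


def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def periodic(x1, x2, period=1.0, lengthscale=1.0):
    """Standard periodic kernel on 1-D inputs (illustrative choice)."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * (np.sin(np.pi * d / period) / lengthscale) ** 2)


def log_marginal_likelihood(K, y, noise_std=0.1):
    """Exact GP log evidence for a zero-mean GP with Gaussian noise.

    The Cholesky factorisation of the N x N matrix is the cubic-cost step.
    """
    n = len(y)
    L = np.linalg.cholesky(K + noise_std**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))


def score_candidates(x, y):
    """Score two base kernels and their sum and product.

    Hyperparameters are held fixed here purely to keep the sketch short.
    """
    k_rbf, k_per = rbf(x, x), periodic(x, x)
    candidates = {
        "RBF": k_rbf,
        "PER": k_per,
        "RBF + PER": k_rbf + k_per,
        "RBF * PER": k_rbf * k_per,
    }
    return {name: log_marginal_likelihood(K, y) for name, K in candidates.items()}


# Synthetic data, assumed for the example only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(50)

scores = score_candidates(x, y)
best = max(scores, key=scores.get)  # candidate with the highest log evidence
```

Each call to `log_marginal_likelihood` factorises an N x N matrix, so every candidate kernel evaluated during the search pays the cubic cost, and doubling N makes each evaluation roughly eight times more expensive.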


Similar resources

Scalable Structure Discovery in Regression using Gaussian Processes

Automatic Bayesian Covariance Discovery (ABCD) in Lloyd et al. (2014) provides a framework for automating statistical modelling as well as exploratory data analysis for regression problems. However, ABCD does not scale due to its O(N^3) running time for the kernel search. This is undesirable not only because the average size of data sets is growing fast, but also because there is potentially more ...


Covariance Kernels for Fast Automatic Pattern Discovery and Extrapolation with Gaussian Processes

Truly intelligent systems are capable of pattern discovery and extrapolation without human intervention. Bayesian nonparametric models, which can uniquely represent expressive prior information and detailed inductive biases, provide a distinct opportunity to develop intelligent systems, with applications in essentially any learning and prediction task. Gaussian processes are rich distributions ...


GPatt: Fast Multidimensional Pattern Extrapolation with Gaussian Processes

Gaussian processes are typically used for smoothing and interpolation on small datasets. We introduce a new Bayesian nonparametric framework – GPatt – enabling automatic pattern extrapolation with Gaussian processes on large multidimensional datasets. GPatt unifies and extends highly expressive kernels and fast exact inference techniques. Without human intervention – no hand crafting of kernel ...


The Automatic Statistician: A Relational Perspective

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite co...


Automatic Construction and Natural-Language Description of Nonparametric Regression Models

This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural-language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important...



Journal title:
  • CoRR

Volume abs/1706.02524

Publication date: 2017